Implement the parser

In this part of the assignment, we will implement the lexer and parser for our regular-expression compiler.

1 Grammar definition

We will implement a parser for the following grammar, which describes the concrete syntax of regular expressions.

\[ \begin{align} \gnt{Regex} \rightarrow\quad& \gt{0} & \text{empty set}\\ \ |\ \quad& \gt{3} & \text{epsilon}\\ \ |\ \quad& \gt{c}&\text{where } \mathrm{c} \in \Sigma \\ \ |\ \quad& \gnt{Regex} \gt{.} \gnt{Regex} & \text{concatenation}\\ \ |\ \quad& \gnt{Regex} \gt{*} & \text{Kleene star}\\ \ |\ \quad& \gt{!} \gnt{Regex}& \text{negation (not)}\\ \ |\ \quad& \gnt{Regex} \gt{\&} \gnt{Regex} & \text{conjunction (and)}\\ \ |\ \quad& \gnt{Regex} \gt{|} \gnt{Regex} & \text{disjunction (choice)}\\ \ |\ \quad& \gt{(} \gnt{Regex} \gt{)} & \text{parenthetical} \end{align} \]

A few notes about the grammar:

  • \(\gt{Terminals}\) are blue and underlined.
  • \(\gnt{Nonterminals}\) are italicized.
  • We restrict the alphabet, \(\Sigma\), to be the lower case letters a–z.
  • The concrete syntax for the empty string, \(\epsilon\), is the number \(\gt{3}\)
  • The concrete syntax for the empty set, \(\emptyset\), is the number \(\gt{0}\)
  • Concatenation is explicit, so the concrete syntax for the regular expression \(ab\) is \(\gt{a}\gt{.}\gt{b}\)
  • Whitespace is insignificant in our concrete syntax.

Ambiguity

Our grammar is ambiguous because it does not indicate precedence or associativity. We want an unambiguous grammar, so we will use the table below to define precedence and associativity for the operators in our language.

Operator Associativity Notes
\(\gt{.}\) left Highest precedence
\(\gt{\&}\) left
\(\gt{|}\) left
\(\gt{*} \gt{!}\) (none) Lowest precedence

2 Implement the parser

Define a data structure for tokens in src/Tokens.hs. Then, fill in src/Lexer.x and src/Parser.y, to implement the parser.

TipTests

Consider implementing some tests for the lexer and parser, before implementing the lexer and parser themselves! There are starter files in the test directory.

Tip

I recommend not worrying about precedence and associativity, at first.

Implement the grammar as defined, then generate it. At that point, there should be shift/reduce conflicts. Fix them using Happy’s precedence and associativity rules.

Warning

In the table above, the precedence of the operators is given in descending order (highest-precedence first). In Happy, it is the opposite: we list the rules in ascending order (lowest-precedence first).

NoteResources

In addition to the resources listed earlier in the assignment, Happy’s documentation for precedence and associativity rules will be helpful.

Once you have implemented the lexer and parser, the code should build and not mention parser conflicts during the build process.

Now, we just need to hook into the phases of our compiler!